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ANALYSIS  OF  THE  FINITE  PRECISION  S'-STEP  BICONJUGATE 

GRADIENT  METHOD 

ERIN  CARSON  AND  JAMES  DEMMEL 

Abstract.  We  analyze  the  5-step  biconjugate  gradient  algorithm  in  finite  precision  arithmetic 
and  derive  a  bound  for  the  residual  norm  in  terms  of  a  minimum  polynomial  of  a  perturbed  matrix 
multiplied  by  an  amplification  factor.  Our  bound  enables  comparison  of  5-step  and  classical  biconju¬ 
gate  gradient  in  terms  of  amplification  factors.  Our  results  show  that  for  5-step  biconjugate  gradient, 
the  amplification  factor  depends  heavily  on  the  quality  of  5-step  polynomial  bases  generated  in  each 
outer  loop. 


1.  Introduction.  Krylov  subspace  methods  (KSMs)  are  a  class  of  iterative  al¬ 
gorithms  commonly  used  to  solve  the  linear  system  Ax  =  b.  In  classical  KSM  imple¬ 
mentations,  in  iteration  n,  the  updates  to  the  solution  Xn+i  and  residual  r„+i  consist 
of  one  or  more  sparse  matrix-vector  multiplications  (SpMVs)  and  vector  operations 
in  each  iteration.  On  modern  computer  architectures,  the  performance  of  these  oper¬ 
ations  is  communication-bound]  the  movement  of  data,  rather  than  the  computation, 
is  the  limiting  factor. 

Communication- avoiding  KSMs  (CA-KSMs),  based  on  s-step  formulations  (  [4, 
6,  8,  12,  27,  23,  24]),  reduce  the  total  communication  cost  by  a  factor  of  0(s)  by 
performing  0(s)  computation  steps  per  communication  step  (see,  e.g.,  [3,  7,  13]).  This 
asymptotic  reduction  in  communication  cost  yields  significant  speedups  in  practice  for 
many  problems  [19]. 

Although  CA-KSMs  are  mathematically  equivalent  to  their  classical  counterparts, 
their  finite  precision  behavior  may  differ.  It  has  been  empirically  observed  that  the 
rate  of  convergence  of  CA-KSMs  deviates  further  from  the  convergence  of  the  classical 
method  as  s  increases,  and  that  the  severity  of  this  deviation  is  heavily  influenced  by 
the  polynomials  used  for  the  s-step  Krylov  bases  (see,  e.g.,  [3,  13,  14,  1]). 

In  this  work,  we  derive  Lanczos-type  matrix  recurrences  governing  the  s-step  bi¬ 
conjugate  gradient  method  (BICG)  in  finite  precision  arithmetic,  which  demonstrates 
the  algorithm’s  relationship  to  classical  BICG.  Using  the  recurrence,  we  extend  the 
results  of  Tong  and  Ye  for  classical  BICG  [25]  to  derive  an  upper  bound  on  the  norm  of 
the  updated  residual  in  hnite  precision  s-step  BICG.  Our  bound  provides  an  analytical 
explanation  for  commonly-observed  convergence  behavior  of  s-step  BICG. 

2.  Related  work.  We  briefly  outline  the  available  literature  on  relevant  topics, 
namely  the  analysis  of  KSMs  in  finite  precision  and  s-step  KSMs. 

2.1.  Analysis  of  finite  precision  Krylov  methods.  There  are  two  primary 
effects  of  roundoff  error  in  finite  precision  KSMs:  the  maximum  attainable  accuracy 
of  the  solution  is  decreased,  and  convergence  may  deteriorate.  Much  research  has 
been  devoted  to  better  understanding  this  behavior,  and  to  devise  more  robust  and 
stable  algorithms. 

An  upper  bound  on  the  maximum  attainable  accuracy  for  finite  precision  KSMs, 
limited  by  the  deviation  of  the  Lanczos  residual  from  the  true  residual,  was  obtained 
by  Greenbaum  [10].  Greenbaum  proved  that  this  bound  can  be  given  a  priori  for 
methods  like  CG,  but  cannot  be  predetermined  for  methods  like  BICG,  which  can 
have  arbitrarily  large  intermediate  iterates.  There  are  also  techniques  for  alleviating 
this  loss  of  accuracy,  namely,  residual  replacement  strategies,  where  the  computed 
residual  is  replaced  by  the  finite  precision  evaluation  of  the  true  residual  at  carefully 
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chosen  iterations  (see,  e.g.,  [22,  26].  In  this  way,  agreement  between  the  true  and 
computed  residual  is  maintained  to  within  a  factor  of  0(e). 

In  [11],  Greenbaum  proved  backward  stability  of  the  Hnite  precision  CG  algorithm, 
by  showing  that  the  computed  Ritz  values  lie  in  small  intervals  around  the  eigenvalues 
of  A.  There  are  many  other  analyses  of  the  behavior  of  various  KSMs  in  finite  precision 
arithmetic  (see,  e.g.  [17,  16,  25]).  The  reader  is  also  directed  to  the  bibliography  in 
[20]. 

Our  analysis  is  most  closely  related  to  the  work  of  Tong  and  Ye  [25] .  The  authors 
derived  a  bound  for  the  residual  norm  of  classical  BICG  in  finite  precision,  expressed 
as  the  product  of  a  minimum  polynomial  of  a  perturbed  matrix  and  an  amplihcation 
factor.  Our  analysis  generalizes  the  work  of  Tong  and  Ye  to  the  s-step  BIGG  case. 

2.2.  s-step  Krylov  subspace  methods.  The  first  instance  of  an  s-step  method 
in  the  literature  is  Van  Rosendale’s  s-step  CG  [27].  Van  Rosendale’s  implementation 
was  motivated  by  exposing  more  parallelism  using  the  PRAM  model.  Chronopoulos 
and  Gear  later  created  the  s-step  GMRES  method  with  a  similar  goal  [5] .  Walker  used 
s-step  bases  to  improve  stability  in  GMRES  by  replacing  the  modified  Gram-Schmidt 
orthogonalization  process  with  Householder  QR  [28].  These  authors  used  monomial 
bases,  and  found  that  convergence  often  could  not  be  guaranteed  for  s  >  5.  It  was 
later  discovered  that  this  behavior  was  due  to  the  inherent  instability  of  the  monomial 
basis,  which  motivated  research  into  the  use  of  other  bases  for  the  Krylov  subspace. 

Hindmarsh  and  Walker  tried  a  scaled  monomial  basis  to  improve  convergence  [12], 
but  saw  only  minimal  improvement.  Joubert  and  Carey  implemented  a  scaled  and 
shifted  Chebyshev  basis  which  led  to  more  accurate  results  [14].  Bai  et  al.  improved 
convergence  using  a  Newton  basis  [1].  Although  successively  scaling  the  basis  vectors 
can  lower  the  condition  number  of  the  basis,  this  computation  reintroduces  commu¬ 
nication  dependencies.  Hoemmen  solved  this  using  a  novel  matrix  equilibration  and 
balancing  approach  as  a  preprocessing  step,  which  often  alleviated  the  need  for  scaled 
basis  vectors  [13]. 

Hoemmen  et  al.  [7,  13,  19]  derived  communication-avoiding  variants  of  Lanczos, 
Arnoldi,  CG  and  GMRES.  We  use  communication-avoiding  to  specihcally  refer  to 
s-step  variants  implemented  using  the  communication-avoiding  matrix  powers  ker¬ 
nel,  which  applies  to  well-partitioned  sparse  matrices  (see,  e.g.,  [18]).  Derivations 
of  communication-avoiding  variants  of  nonsymmetric  Lanczos-based  KSMs,  such  as 
BIGG,  CGS,  and  BICGSTAB  can  be  found  in  [3]. 

2.3.  s-step  BICG.  We  briefly  review  s-step  BICG  for  solving  Ax  =  b,  where 

A  G  (see  Alg.l).  Note  that  this  overview  is  meant  for  the  familiar  reader;  in 

the  interest  of  space,  we  defer  to  numerous  other  works  on  the  topic,  such  as  [3,  4,  5, 
7,  13,  15,  27,  24].  For  simplicity,  we  assume  A  is  full  rank. 

Throughout  the  remainder  of  the  paper,  denotes  a  zero  matrix  of  size  i  x  £ 
and  Oi  is  a  column  vector  of  i  zeros.  We  use  I  to  denote  the  square  identity  matrix; 
dimensions  are  either  given  as  a  single  subscript,  or  are  implicit  from  context.  We  use 
Ci  to  denote  the  column  of  appropriately  sized  I. 

In  each  outer  loop  k  of  s-step  BICG,  we  generate  Krylov  bases  with  the  current 
search  direction  and  residual  vectors,  Psk  and  Vsk,  which  we  denote  as  having 
basis  length  s  -I-  1,  and  having  basis  length  s.  The  basis  vectors,  or  columns  of 

Yi  =  \vu,---yJ  ,  are  generated  by  the  three-term  polynomial  recurrence 


Vfc,,+i  =  l^{.A-  ej) 


(2.1) 
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Algorithm  1  s-step  BICG 
1;  a;o,  ro  =  fo  =  b-  Axq,  po  =  Po  =  ro,  k  =  0 
2:  while  not  converged  do 
3:  Calculate  Yl,  VI,  Cf ,  Vl 

4:  Zfc  =  [Vl,  Vl],V,  =  [Vl,  Vi],  Gk  =  vlVk 

p'kfi  ~  [^>  “  [Ol,s+li  Ij  Oi_s_l]  ,  xI  q  =  [O2S+1] 

P'k,0  ~  [1>  0l,2s]^)  ^  =  [Oi_s+i,  1,  Oi_s_i]^ 

7;  for  j  =  0  :  s  —  1  do 

8:  ask+,  =  m^,VG,rl^)/iipl^rG,B,pl^) 

9:  4,i+i  = 

10:  rl  j^^  =  -  Bfc  (^a,k+3P'k,j) 

11^  r'k,j+i  =  r'k^j  -  Bk  {a^k+3P'kj) 

10:  Pk,3  +  1  ~  l’fc,j  +  l  Psk+j+lPkJ 

14*  PkJ-\-l  j  +  1  d~  ^sk+j+lPkJ 

15;  end  for 

16;  Xgk+s  =  ^k^k,s  ~k  ^sk,  I’sfe+s  =  ^k^k,s,  Psk+s  =  ^kPk,s 

17;  fsk+s  =  Vk^k^s,  Psk+s  =  ^kP'k,s 

18;  k  =  k  +  1 

19;  end  while 
20;  return  Xgk 


with  starting  vector  g  =  Psk-  We  assume  we  use  the  same  recurrence  in  constructing 
vl  j.  The  choice  of  parameters  7^,  9i,  and  Ui  play  a  large  role  in  determining  the  quality 
of  the  resulting  basis,  which  in  turn  affects  both  stability  and  convergence  in  s-step 
BICG.  We  denote  Vk  =  [Vl,Vk]-  We  also  denote  Vk  =  Oat,  I4’',  O^v]  where 
and  Vk  are  Vfl  lesp.,  with  their  last  columns  omitted. 

Within  the  inner  loop,  in  step  j  of  outer  loop  k,  we  update  the  length-(2s  -I- 
1)  coefficients  for  the  BICG  vectors  as  linear  combinations  of  the  columns  in  Vk 
(rather  than  explicitly  update  the  length- A  BICG  vectors,  as  in  classical  BICG).  The 
coefficient  vectors  are  denoted  with  prime  symbols  (i.e.,  Vgk+j  =  Vk^k  j,  and  similarly 
for  Psk+3  and  Xgk+j)-  The  inner  iteration  updates  then  become 


where 


A, 3+1  =  r'k,3  -  ask+jBkP'kj  and 
Pk,j+1  —  I’fcj'-l-l  +  Psk+3  +  lPk,j, 


Bk  = 


Gk^s+i  Os+14 


[  Ck,. 


Ck,3  = 


9o  -cri/71 

1/70  9i 
1/71 


-V-i/V-i 

9j-i 

l/7i-i 


(2.2) 

(2.3) 


with 
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We  can  rearrange  (2.2)  and  (2.3)  as 

B-kPkj  =  (rkj  ’’/cj+i)  and 

(2.4) 

CUsk+j 

^k,j  Pk,j  ^sk+jPkJ—1' 

(2.5) 

Premultiplying  (2.5)  by  P;.,  we  obtain 

^kTk,j  ~  ^kPk,j  ~  Psk+jVkPkJ—l- 

(2.6) 

This  equation  is  valid  for  1  <  j  <  s,  since  pj,  _i  is  undefined.  When  j  =  0,  we  have 

=  Vk-l^k-l,s 

=  {p'k-l,s  ~  f^skP'k-l,s-l) 

=  ^kp'kfl  ~  [iskVk-lp'k-l^s-li 

which  gives  a  valid  expression  for  the  j  =  0  case. 

Now,  let 

^kj  ■  ■  ■  ’  j]  and  ^kj  \Pk,0'’Pk,i'>  ■  ■  ■  ■ 


We  can  write  (2.6)  in  block  form  as 

^kRk,j  —  ^kPk,jRk,j  ~  PskVk-lPk-l,s-l^l  ; 

where 

r  1  —Psk+i  1 


l^sk+j 

1 

Premultiplying  (2.7)  by  A,  we  obtain 

AVkR'k,,  =  AVkPi,Uk,j  -  PskAVk-ip'k-i,,-ieJ. 
We  can  also  write  (2.4)  in  block  form  as 

^kPk,j  ~  Rk,jPk,jAk,j  ~  T  ^fc,i+l®j  +  l! 

0^sk-\-j 

where  Akj  =  diag  (ask,  ■  ■■ ,  ctsk+j)  and 


Lk,j 


1 

-1  1 


-1  1 


If  we  premultiply  (2.9)  by  Pj,  and  postmultiply  by  Ukj,  we  obtain 

Yik^kPkjPkJ  =  y-kRk,jPk,jAk,j^k,j  ~  T 

^sk-\-j 


(2.7) 


(2.8) 


(2.9) 
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which  can  be  written 


^^kPkjUk,j  =  VkR'k.iLk,]-^k^iUkJ  — 


1 


Y-k'^k,j+l^j+l 


(2.10) 


since  AVk  =  Vk^k  9-^^  Y-kR-'k  j  —  ^kR'k  j  for  j  <  s  —  1.  We  can  then  combine  (2.8) 
and  (2.10)  to  obtain 


AVkR'k,,  =  VkRl.tk,,  -  -^Vk-yk-i,s-iel  -  -^Vyk,j+ieJ+i,  (2.11) 

o  o  ask-1  ask+j  ■’ 

where  Tk  i  =  Lk  iAz]Uk  i  +  ei  eT.  Note  when  k  =  0,  -IAk-  jg  defined  to  be  0. 

k,J  ^■’J  -'-Qsfc-l  ’  Oisk-1 

We  can  now  combine  outer  loop  iterations  in  block  form  to  write  the  s-step  BICG 
recurrence  for  iterations  0  through  sk  +  j.  Let  Vk  =  [Vb,  Vi, . . . ,  14].  Let 


1 


r: 


•0,s-l 


and 


'^k,j  — 


R'k,j  - 


J_  _ih 

ao  Q:o 

_  J_  J_  _L  ^ 

ao  Ql  Q:o 


Rls-i 


R[ 


k,j  J 


Ctsk+j-l 

I  l^sk+j 


Then  by  (2.11),  we  can  write 


1 


^'^kR-kJ  ~  ^kRk,jRk,j  Vk'''k,j  +  l^sk+j+l- 

ask+j 

Since  we  can  write  the  residual  vectors  as  TZn  =  [ro,...,r„]  =  VkR'kj,  where 
n  =  sk  +  j ,  we  can  write  the  above  as 

ATZ-ji  —  'Rn'T'n  lln+l^ri+l^ 


which  gives  us  the  same  governing  equation  for  iterations  0  through  sk  +  j  as  the 
classical  BICG  algorithm  in  exact  arithmetic  [25].  Note  that  a  similar  relation  holds 
for  the  dual  Krylov  vectors  fsk+j  and  Psk+j  ■ 

3.  s-step  BICG  in  finite  precision.  The  goal  of  this  section  is  to  derive  a 
Lanczos-type  recurrence  for  finite  precision  s-step  BICG  of  the  form 

AVkRk,j  =  '^k'R'k+Tkj  —  -  +  eAkj 

^sk-\-j 

and  upper  bound  the  size  of  the  error  term  eAkj.  We  assume  a  standard  model  of 
floating  point  arithmetic,  where 


fl  {ax  +  y)  =  ax  +  y  +  Si, 
fl  (4a;)  =  Ax  +  62, 


where  [di]  <  e2  jax]  -f  \y\  +  0{e^),  and 
where  [(521  <  \A\  \x\  +  O(e^), 
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where  x,y  G  a  €  M.  In  the  remaining  analysis  we  drop  higher  powers  of  e  for 
simplicity.  Let  e  be  the  machine  precision  unit.  For  simplicity  of  notation,  we  now 
let  ask+j,  rk,a,  Pk,s,  (isk+j,  Zfc,  and  be  the  computed  quantities  in 

finite  precision  s-step  BICG. 

At  the  {sk  +  iteration,  to  compute  we  first  compute  Bf^p'/,  j  and  have 

fl  {BkP'kj)  =  HkP'kj  +  9.  where  1^1  <  e(2s  +  1)  |  \  . 

Then 


+  l  ^  (j^kJ  kXsk+j  '  fl  (^I^kPk,j')') 

~  "^kj  ~  k^sk+j]^kPk,j  ~  k^sk+j9  T  9  )  (3-1) 

where  \g'\  <  e{\r'^^j\+2\ask+j\\B^p'k,j\)  ■  Let  =  {ask+j9  +  9')  /  {^\ask+i\)-  Then 
using  (3.1)  we  obtain 


(rfcj+i  -  =  -BkP'kj  +  e^r 


where 


Similarly, 


.  I  <  (2s  +  1) \B^.  I  \p'i.j  I  + 


+  2  RkP'k, 


Pk,j  +  1  —  fl  (^fc,i+l  +  l^sk+j  +  lPk,j) 

=  ^fc.i+1  +  Psk+j  +  ip'k,j  +  /) 

where  |/|  <  e  (  +  2  \Psk+3+i\  p'k,j  )•  Letting  =  //e,  we  have 


P'k,j+i  —  ^fc.i+i  +  l^sk+j+ip'k,j  +  eVfc  i+i’ 


where 


<  V'k  d+1  I  +  2|/3sfe+j+i  I  \Pk,j  I  ■ 

Rearranging  (3.2)  and  (3.3),  we  can  write 

RkP'kj  =  ('’fcj  -  r'k,j+i)  +  ,  and 

^k,j  ~  Pk,j  ~  l^sk+jPk,j-l  T  ^^P'k,j  ’ 
and  premultiplying  (3.4)  by  14  gives 

Lfcrfcj  =  Vkp'kj  -  l3sk+jVkp'k^j_i  +  eVk6p>^^. 


(3.3) 


(3.4) 


This  equation  is  valid  for  1  <  j  <  s,  since  p).  _i  is  undefined.  When  j  =  0,  we  have 


Lfc^fc.o  —  /^(Xfc-i^fe-i,s) 

=  V^-ir'k-i^s  +  <^^k-i  and 
LfcPfcp  =  fKy-k-lP'k-l,s) 

=  Vk-iP'k-i,s  + 
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where  \(j)l_^\  <  (2s  +  l)|Zfc-i | and  <  (2s  +  l)|Zfe-i | |Pfc-i,J-  Then  for 

j  =  0,  we  can  write 

=  Z/c-l  (p'k-l,s  ~  Pskp'k-l,s-l  + 

=  14Pfc,o  -  #Li  -  PskVk-ip'k-i,s-i  +  eZfc-iV,_i,,  + 

=  ^kp'kfi  —  PskVk-lp'k-l,s-l  +  e  ,  +  4>\-l  —  ■ 


Now,  let  Afl' 

k,j 

then  write 


and  Ap' 

k,j 


02s+i,(5p^_^, . . . 


We  can 


^kRk,j  ~^kPk,jUk,j  PskVk-lPk-l,s-l^l  + 

+  e  (Zfc-i  +  4>1-1  -  4>l-^  ef  and  (3.5) 

PkPk,j  =R'k,jPk,j^kJ  ~  ~  ^k,j  +  l^J+l  +  ^^R'k  ,•  (^■®) 

^sk-\-j 

Premultiplying  (3.5)  by  A  gives 

^^kRk,j  —^^kPkjUkJ  —  l3skAVk-lPk-i^s-l^l^  T  j 

+  (i^fc-i  Vfc_i.,  +  ^k-i  -  '(’fc-i)  ef ,  (3.7) 

and  premuliplying  (3.6)  by  gives 

y^-kPkPk,j  =^kR'k,jLk,jAkj  i(fe^fcj+ieJ+i  +  e-VkAfi>^  (3.8) 

^sk-\-j 

for  j  <  s  —  1. 

Now,  to  write  the  error  in  s-step  BICG  in  the  context  of  classical  BIGG,  we  must 
account  for  error  in  computation  of  the  s-step  bases.  Rearranging  the  finite  precision 
evaluation  of  (2.1),  we  obtain 


k,i-\-l 


-V 


,P 

k,i—'i- 


+  e6„ 


where  we  can  write  6„p  as 

"k.i 


lk.l  =  (^  +  2)|7l||<J+3|0.<, 


3|crjl 


kA  —  1  \ 


Since  we  generate  ^  by  the  same  recurrence,  we  also  have 


—(-^  +  2)1^  -I- 3 


3 


(JjV 


Letting  Ay,  =  •  ■  • ,  ,  0, 5)), 

precision  basis  computation  as 


r 

Vk,s-2  ’ 


0 


we  can  then  write  the  finite 


AVk  —  Y_kB_k  +  eAy, . 


(3.9) 


Using  (3.9),  we  can  write  (3.8)  as 


(AVfe  ^^Vk)Pk,j  ~  ^kRk,j^k,j^k,j  i^fc^fc,j  +  l®i  +  l  +  > 

^sk-\-j 

which  can  be  rearranged  to  obtain 

AVkPij  =  ykB!k,,Lk,,K]  -  -^vyk,,+ie^+i  +  .  +  eAy.P',^-.  (3.10) 

^sk+j 

Postmultiplying  (3.10)  by  Uk,j  gives 

dVfePfcjUfcj  =VkRf^jLkjAi,jUkj  —  — 

+  eiVkAn'^^Uk,j  +  Av,Pi,Uk,,),  (3.11) 

and  combining  (3.11)  and  (3.7),  we  obtain 


^VkRkJ  —^kRk,jPk,jRk,j^k,j  y^k'k'k,j  +  l^j  +  l  Psk^yk-lPk-l,s-l^l  (3-12) 

^sfc+Jf 

+  e  (AVkAp,^  +  H ^ Uk,j  +  AvkP'k,jUk,,)  (3.13) 

+  (Zfc-i  +  ^'k-i  ~  ^k-i)  ^1-  (3-14) 

Since 


PskAVk-lPk-l^s-l^l  —Psk{Y_k-lB-k-l  P  Avk_i)Pk-l,s-l^l 

=l3skVk-i  (  -^irk-i,s-i  -  r'k-i^s)  +  e^r' )  ef 

\(^sk—l  ’  / 

+  PskAvk.,p'k-i,s-ieI 

Psk  jr  /  T  Psk  /Tr  /  ir  ^  T 

= - Vk-irk-i^s-i^i - (Hrfe^o  -e0fc-i)ei 

ask-i  cusk-i 

+  ^PskVk-i5r'^^_^  +  /?sfcAvfc_jpJj_;^_g_;^ef , 

we  can  write  (3.14)  as 


l^sk 


1 


AVkR!k,,  =  VkR'k^Tk,,  -  -AP^Vk-iv'^  +  ^Akj, 


(^sk—1 


^sk-\-j 


where 


Afcj-  =AVkApi^  ,  +  AVk-i5pi^__^  ^ej’  +  VkAp<^  ,Uk,j  —  PskVk-i5r'i^_^  a_i®i'  (3.15) 

+  AvkP'k,,Uk,j  -  f3skAvk.,p'k-i,s-ie^  +  f^(<^^fc-i  -  ^Li)  -  ef- 

(3.16) 

Writing  Akj  =  [(5sfc)  •  •  •  >  Sgk+j],  we  have  that  the  {sk+j  +  1)*^  column  of  Akj  is 
^sk-\-j  —  AVk5pi^  _  pykSr'f,  .  Psk+jVk5r'^  ._^  +  Ay^  (3.17) 
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for  j  >  0,  and 


Ssk  Pskyk-l5r'^,_-^^^^_-^^  +  '^«fc,0  f^sk+j  ^Vk-iPk-l,s-l 

+  (A{(j>l_,-cj>l_^)-^(j>l^  (3.18) 

V  «sfc-l  / 

for  j  =  0. 

Using  the  inequalities  \l3sk+jp'k,j-i\  <  \p'k,]\  +  V'k,j\+0{t)  and  <  \r'k,j\  + 

la^fe+j-illiifcPfcj-il  +0(e),  we  can  bound  the  norm  of  the  columns  as 


|<5sfe+i|  <  +  6)|  A|  |l4|  +  (2s  +  8)|Ufc|  |i3j,|  + 


^sk-\-j  I 


^sk-\-j 
^sk-\-j  —  l 


+  (^2|A||y,|+(45  +  7)|Z,||^,|j|p;j, 

if  j  >  0.  For  the  j  =  0  case,  we  have 

\Ssk\<({N  +  2s  +  7)\A\ |Ufc_i I  +  (2s  +  8) |U,_i  | 


(3.19) 

(3.20) 


+ 


^sk 


(2s 

+  2)  , 

^sk 

«sfc-l 

^'^\Vk-i\\r'k-i,s 


+  (^{2N  +  4s  +  16)\A\\V,_,\  +  (6s  +  22)\V,_,\\B,_,\^  |pVi.«|  (3-21) 

We  can  thus  write  the  finite  precision  s-step  BIGG  recurrence  for  iterations  0 
through  sk  +  j  as 

AVkTi-i^j  =  ^kTT-t^jTk^j  —  —  V_krkj+iegk+j+i  +  ^^k,j,  (3.22) 

(^sk-\-j 


where  ^k,j  —  [^o,s— 15  ^i,s— i?  •  ■  ■  ? 

3.1.  Comments.  Note  that  we  can  write  n  iterations  of  finite  precision  classical 
BIGG  as  n  iterations  of  finite  precision  s-step  BIGG  with  s  >  n,  performed  in  the 
standard  basis.  That  is,  we  have  a  single  outer  loop  iteration  fc  =  0  and  j  =  n  inner 
loop  iterations,  with  Vq  =  In,  Bq  =  A,  R'o  n  =  Ro,n,  and  Pq  „  =  Po,n-  Now,  since 
Uq  =  In,  Avo  =  0,  and  since  k  =  0,  ^Vk-i  s  defined  to  be  zero. 

Plugging  in  to  (3.15),  we  get 


Ao,n  =  +  ^Ro,r,Bo,n, 

which  reproduces  the  error  term  (modulo  a  factor  of  2)  obtained  by  Tong  and  Ye  for 
finite  precision  classical  BIGG  [25]. 

Also  note  that  from  (3.15),  we  can  see  that  the  first  four  terms  on  the  right-hand 
side  correspond  to  the  two  terms  in  Tong  and  Ye’s  analysis  for  classical  BIGG,  and 
the  remaining  terms  correspond  to  the  error  in  computing  the  s-step  Krylov  bases 
and  the  change  of  basis  operation.  We  can  also  see  that  a  bound  on  the  size  of  the 
error  in  each  column  of  the  finite  precision  recurrence  depends  on  both  the  magnitude 
of  the  error  in  computing  the  s-step  Krylov  bases,  i.e.,  ||Av,,||,  as  well  as  the  size  of 
the  bases,  i.e.,  ||I4||. 
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3.2.  Diagonal  scaling.  As  in  [25],  it  will  be  more  convenient  to  work  with  a 
scaled  version  of  (3.22)  in  subsequent  sections.  Let  =  [zq.Oj  •  ■  •  >  Zk,j\  =  ^kT^'k  j^k] 
where 

Dk,j  =  diag(||Vbr(|_o||  ,.. . ,  ||Vbro,,_i||  ,  ||V"ir(  o||  ,. .. ,  ||l4r;.j||)  . 

We  can  then  write  the  scaled  version  of  (3.22)  as 


where  Tkj 

and 


kj  >k,j 


1  ]^k'^k,j  +  l  T 

ask+j  llroll 


(3.23) 


Vkr'kJl  ask+j/ llroll  =  a^k+j/ ||l^oro,o||  =  e^k+j+i'T'kj 


^k,j 


—  ^k,jD 


-1 

k,j’ 


4.  Bounds  on  ||rsfc+j+i|j  for  finite  precision  s-step  BICG.  In  this  subsec¬ 
tion,  we  upper  bound  the  norm  of  the  updated  residual  computed  in  iteration  sk  +  j 
of  s-step  BICG.  First,  we  will  review  a  series  of  Lemmas  proved  by  Tong  and  Ye  [25]. 
The  proofs  shown  below  are  nearly  identical  to  those  given  by  Tong  and  Ye  [25],  al¬ 
though  we  have  changed  the  notation  and  indexing  for  consistency  with  our  s-step 
formulation^ . 


Lemma  4.1.  Assume 

—  1  ^^k^ k  iA-l  T 

AZkj  =  ZkjTk,j  -  -z  lj— A 

o-sk+j  llroll 

with  tq  =  ||ro||  zq.  Then  for  any  polynomial  p{x)  =  of  degree  <  sk  + 

j  +  1, 

d(^)^0  =  Zkjp{'Tk,j)ei  +  Csk+jY-k'^kJ  +  l^ 
where  Csk+j  =  (-1)®'"+-’+ Vsfc+j+i/(ao  ' ' '  ask+j  ||ro||). 

Proof.  First,  we  will  prove  by  induction  that  for  1  <  z  <  sfc  -I-  j 

A^Zk,jei  =  Zk,j%ei.  (4.1) 

For  z  =  1,  we  have 

AZk+ei  =  ^ZkjTk,j  —  ^  ei  =  ZkjTk+ei. 

Now,  assume  (4.1)  holds  for  some  i  <  sk  +  j.  Then 
A^^^  Zkjei  =  A(A^  ZkpCi) 

—  ^i^kjTkpei) 

=  (  ZkjTkj  -  r^Yei 


^sk-\-j 


ro\ 

■i+i. 


—  ^kjTkjTkj^l  — 


^One  lemma  presented  is  slightly  different  than  what  appears  in  [25]  due  to  a  minor  mathematical 
error  that  we  correct. 
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where  we  have  used  the  fact  that  ^  when  i  <  sk  +  j.  Therefore  the 

inductive  hypothesis  holds.  Now  consider  the  case  i  =  sk  +  j .  We  then  have 


—  I  ZkjTkj 


^kr'kj+i 


iFoll 


^sfc+j  +  l 


Since  it  can  be  shown  that  eTt,_L.^i  =  t— 


and 


^sk-\-j 


VkK 


k,j 


'sk-\-j-\-l  ^  k,j 

ask+jl  Ikoll,  we  have 


Vkrij  (oo-'-asfc+j  Ikol 


N-l 


A^^+^+^Zk.ei  =  Zk.fk,iV':^^ei 


—  ^kjTh 


3  '  f^,3  'k,j 
•sfc+ji  +  1 


_1  \sfc+j  +  l 


J  'k,j 


ei 


(-1) 


Oo  •  •  •  CXsk+j  Ikoll 


VkT'k. 


j  +  1- 


The  lemma  follows.  □ 


We  now  use  this  result  in  proving  the  following  identity. 
Lemma  4.2.  Assume 


AZkj  —  Zkj7k,j 


1  — fc^fcj  +  l  X 

ask+j  Ikoll 


with  ro  =  Ikoll  zo  and  ask+j  =  a^k+j+iT^kj^'^-  Assume  that  TT"^  €  is  a 

matrix  such  that  W"'"Zkj  =  I  and  W'^V_f.r'f.  =  O^fc+j+i.  Then  for  any  polynomial 

p{x)  of  degree  not  exceeding  sk  +  j  with  p{0)  =  1,  we  have 


VkT'k, j+i  =  [l  -  AZk,,%-jW^)  piA)ro. 


Proof.  First,  we  multiply  by  Tf  jci  to  get 


1  VkT'k 


AZk,fnr^er  =  (  Zk,,Tk,,  -  )  T^^ei, 


which  allows  us  to  write 


^sk+j  1 1  roll 


VkTk,j+i  ^  _  ^Zk,,Tkjek. 


Ikoll 


Now,  let  p{x)  =  1  +  x(j){x),  with  (j){x)  =  X]i=o^  a  polynomial  of  degree  not 

exceeding  sk  +  j.  Then 

=zo-  AZk,jT^lei  +  ip{A)zo  -  p{A)zo) 

Ikoll 

=  “^^(41)2:0  —  AZk,jTi.  j  Cl  +  p{A)zo 
=  —AZk,j(f>(Tk,j)ei  —  AZk,jTi^  j  Cl  +  p(A)zo 

—  ~z^Zk,j{4>{'Tk,j)  +  'TkJ)ai  +  p{A)zo 

—  ~^Zk,j'TkJ  p{'Tk,j)ei  +  p{A)zo. 


(4.2) 
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By  Lemma  4.1,  recall  that 

p(^)^0  =  Zk,jP{'fkj)ei  +  Csk+j\^k''"k,j  +  li 
and,  multiplying  by  we  have 

W'^p{A)zo  =  W'^  {Zk^j p{Tk, j)ei  +  Csfc+^-Zfc^fc,j+i) 
P{'Tk,j  )^1  ; 

since  =  /  and  ^ =  Os^+j+i.  Now,  we  can  write 

=  -AZk,,%-JW^piA)zo  +  p{A)zo 
=  -  AZk,,%-jW^)  p{A)zo, 

and  substituting  zq  =  ^0/  Ikoll)  we  obtain 

=  [l  -  AZk,,%-jW^)  p{A)ro, 

which  gives  the  desired  result.  □ 


The  following  lemma  describes  the  construction  of  the  basis  W. 

Lemma  4.3.  Assume  that  zq,  . . . ,  Zsk+j+i  €  are  linearly  independent  and 
write  Zj^  j  —  [zq,  ■  •  ■ ,  ^sk+j\}  Z.k,j  —  \^k^j-}  ^sk+j -^-i]  •  Then  VLq  —  \Isk+j+\^  ^sk+j+i\Z-k^j 
has  the  property  Wq  Z/^  j  =  I  and  Wq  Zgk+j+i  =  O^fe+^+i.  Furthermore,  its  spectral 
norm  is  minimal  among  all  matrices  having  this  property. 

Proof.  By  the  definition  of  ITo,  Zji j  =  \A^o,wY'  for  some  w.  Since  we  assume 
zo, . . . ,  Zsk+j+i  are  linearly  independent, 

\WQ,w]^[Zkj,Zsk+j+i]  =  Zt,jZk,j  =  I- 
Then  ITq  Z]^  j  —  and  ITq 

Now,  assume  W  is  some  other  matrix  such  that  W'^Zkj  =  I  and  W'^ Zsk+j+i  = 
Os/c+j+i  hold.  Then  W  ^Zj^  j,Zsk+j+i\  —  [4sfc+j-i-i, O^fc+j+i]-  Thus,  W  Z_k.jZ.k^j  — 

[W,+i,0,fc+,+i]Z+^.  =  W^.  Hence  ||lTo||  <  ||W^||  •  <  l|W^II-  □ 

We  can  now  present  the  main  result. 

Theorem  4.4.  Assume  (3.23)  holds  and  let  Wq  =  [Isk+j+i,Osk+j+i]Z_k  j  € 
^(sk+j+i)xN^  jf  Zo, . . . ,  Zsk+j+1  are  linearly  independent,  then 

WVkr'kj+iW  <  {^  + Kk,j)  min  ||p(H  +  dHfej)ro||,  (4.3) 

pGf‘sk+j  +  l.piO)  =  l 


where  Kkj  =  |(^^fej  -  eAk,j)%JWo^^  and  SAkj  =  -eAk^jZ^  .. 

Proof.  Since  zq,  ... ,  Zsk+j+i  are  linearly  independent,  Z'^ ^Z^j  =  I.  Then  SAkj  = 
—eAk  jZ'^j  e  satisfies  SA^^  jZ^^j  =  — eA^  j.  Thus  (3.23)  can  be  written  as 


{A  +  SAl;  j)Zl;  j 


1  — fc^fc,j  +  l  T 

ask+j  Ikoll 


(4.4) 
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Then,  by  Lemma  4.2,  for  any  p  €  Psfc+j+i  with  p(0)  =  1,  we  obtain 

=  (^  ~  (^  +  ^^k,j)Zk,jTj.jWQ)  ■  p{A  +  SAkj)ro 

=  (/  -  iAZk,j  -  eAfe,,)7;-iVLo^)  •  p{A  +  dA^jW 

Thus,  we  can  bound  the  norm  of  the  left  hand  side  by 

||i^fe^fcj+i||  ^  +  Wi^Zkj  —  eAkj)Ti.  jWQ\\)  ■  ||p(A  +  dAfej)ro||. 

Since  this  holds  for  any  p(x)  with  p(0)  =  1,  the  inequality  is  true  for  the  minimizing 
polynomial,  which  leads  to  the  bound.  □ 

Note  that  Tkj  =  minpgp^j^^^.^^_p(o)=i  \\p{A  +  SAkj)ro\\  is  the  (sfc  +  residual 
norm  of  exact  GMRES  applied  to  the  perturbed  matrix  A  +  SAkj,  which  decreases 
monotonically  with  increasing  (sfc  +  j)- 

Since  we  have  Kkj  =  \\{AZkj  —  W(^|| ,  we  can  bound  as 

Kk,j  ^  (V sk  +  j  +  l||^||  +  ^ll  Afc j  II)  11^  j*”  II  •  II  Wd  II  • 

Then,  assuming  ||'^~j^||  and  ||lho||  are  bounded,  ||EfcrJ.  ||  is  on  the  order  0{Tkj). 
We  therefore  expect  convergence  of  the  s-step  BIGG  residual  when  Kkj  increases  at 
a  slower  rate  than  decreases,  for  all  values  of  k. 

Unfortunately,  as  in  the  BIGG  case,  we  can  not  determine  Kkj  a  priori,  although 
we  can  make  some  meaningful  observations  based  on  the  bound  in  (4.3).  Glearly,  the 
terms  eA^j  in  K^p  and 

Note  that  in  the  case  of  CG  (SPD  A),  we  have  ||Ufcr(, =  llrs^+j+iH^  = 
||e*j._|_j_i_]^  11^,  where  denotes  the  solution  error  =  x*  —  Xgk+j  for  true 

solution  X* .  Thus  in  this  case  Theorem  4.4  gives  a  bound  on  the  error  of  finite  precision 
s-step  CG.  It  remains  future  work  to  determine  under  what  conditions  ||e*^_|_^_i_]^  ||^  < 
Ilejfe+J^  for  s-step  CG. 

5.  The  linearly  dependent  case.  In  the  analysis  above,  we  assumed  linear 
independence  among  the  residual  vectors  (which  are  scalar  multiples  of  the  Lanczos 
vectors).  For  many  linear  systems,  however,  convergence  of  classical  BIGG  in  finite 
precision  is  still  observed  despite  numerical  rank  deficiency  of  the  basis.  In  [25]  it 
is  shown  how  the  residual  norm  can  be  bounded  absent  the  assumption  of  linear 
independence,  which  gives  insight  into  why  convergence  still  occurs  in  such  cases. 
We  will  now  prove  similar  bounds,  relaxing  the  constraint  that  Zq,  ■  ■  ■ ,  Zsk+j+i  € 
be  linearly  independent.  Again,  our  analysis  extends  that  of  Tong  and  Ye  [25]  for 
classical  BIGG. 

We  note  that  in  the  s-step  case,  there  are  two  potential  causes  of  a  rank-deficient 
basis.  Since  we  have  TZkp  =  '^kpTl-'k  ji  linear  dependence  can  occur  as  a  result  of  the 
finite  precision  Lanczos  process,  as  in  the  classical  method,  as  well  as  from  numerical 
rank  deficiencies  in  the  generated  s-step  polynomial  bases  14. 

Given  A  €  and  B  G  ,  AE  —  EB  =  Z  corresponds  to  the  linear  sys¬ 

tem  with  coefficient  matrix  A®Ij^i  —  If^®B.  This  system  has  a  unique  solution  if  and 
only  if  A(A)nA(i?)  =  0,  or,  equivalently,  ifsep(A,i?)  i=  ||(A0LAr^  —  GB)-i||  ^  > 

0,  which  depends  on  the  spectral  gap  of  A  and  B  (see  [9]). 

Theorem  5.1.  Assume  (3.23)  holds,  and  let  p  he  a  complex  number  such  that 
sep(A  —  pI,Tkp)  0.  Then 

||Zfc?'fcj+i||  <  Kkp  min  (||d(Tfc,i)||  +  Wpi^-  lkl)\\)  Ijr-oll, 

/9GPsfc+j  +  l,/0(0)  =  l 
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where 


Kk,j 


■\/sk  +  j  +  l(sep(A  —  ^I,Tk,j)  +  ImI)  +  e  ^fc,i 

sep(^  -  fJ.I,Tk,j) 


^•max(l,||/j(A-/x/)||  •  \\p{Tk,j)\\)  . 


Proof.  By  (3.23), 


{A  p,I)Zkj  —  ZkjTk,j  _  II  n  Sgk+j+i  A  ^^k,j  pZkj. 

0!sk+j  IkoII 

Then  since  sep(A  —  p,I,Tk,j)  >  0,  the  equation 

{A  pI'jEkJ  =  Ekjf~k,j  ^^k,j  A  pZkj 
has  a  unique  solution  Ekj  with 


p  A 


+  pZk, 


1 II  F 


^  sep(A  -  ^/,7fcj) 
Combining  (5.1)  and  (5.2),  we  can  write 


< 


^  ^k,j  p  A  \p\  ^/sk  +  j  +  l 
sep{A-  p,I,Tk,j) 


{A  pI){Zkj  A  Ekj)  —  {Zkj  A  Ekj)Tk,j  H  ’n  e^k+j+i- 

ctsk+j  Ip  oil 

Thus,  for  any  p  €  p(0)  =  1,  we  have,  by  (4.2), 


—k^kj+l 


(5.1) 


(5.2) 


Fol 


—  p(A  —  p,I){Zkj  A  Ekj)ei  —  {A  —  p,I){Zkj  A  Ek,j)Tk  j  piTk,j)^i, 


and  thus 


llroll 


<i\\ZkjA\\EkJ)\\p{A-pI)\\ 

+  \\A-pI\\  {\\ZkA  A  \\Ekj\\)  \\Tkj\\  \\piTk,j)\\  . 


Since 


ll-^fc.ill  A  ||i?fc,j||  ^  \/ sk  +  j  +  1  + 


^  ^k,j  p  A  \p\  y/sk+j  +  1 
sep(A  -  pil.Tk.j) 


we  obtain  the  desired  result.  □ 

Note  that  in  this  case,  if  p  is  such  that  sep{A  —  pi ,Tk,j)  is  large,  the  quantity 


Kkj  depends  heavily  on 


'k,j 


.  The  minimizing  polynomial  part  of  the  bound  now 


depends  on  both  p{Tk,j)  and  p{A  —  pi). 


6.  Extensions:  perturbation  theory.  We  can  think  of  (4.4)  as  an  exact  sub¬ 
space  relation  for  a  perturbed  A,  i.e.,  the  quantities  Vfc,  E-kj^  and  Tkj  produced  by 
the  finite  precision  s-step  BICG  algorithm  satisfy  an  exact  subspace  recurrence  (4.4) 
for  the  perturbed  system  A  +  6Akj.  This  means  that  the  eigenvalues  of  the  com¬ 
puted  matrix  Tkj  generated  by  the  s-step  algorithm  are  among  the  eigenvalues  of  the 
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perturbed  matrix  A  —  eA.kjTl^ .  In  the  next  theorem,  we  bound  the  distance  of 
these  eigenvalues  to  eigenvalues  of  unperturbed  matrix  A. 

Theorem  6.1.  Let  A  be  a  normal  nx  n  matrix  of  full  rank.  For  each  eigenvalue 
H  of  the  matrix  Tk,j  computed  by  the  finite  precision  s-step  (BI)CG  method,  there 
exists  an  eigenvalue  X  of  A  such  that 

|7-H  <e||Afc.,|y|7^'+JJV+||2  (6.1) 

Proof.  Note  that  Tk,j  =  Dk,jTk,jDk]  same  eigenvalues  as  7fcj.  By 

application  of  the  Bauer-Fike  theorem  [2]  to  (4.4),  there  exists  an  eigenvalue  of  7  of 
A  such  that 

|7-m|  <  ellAfcjZyiy  (6.2) 

We  can  then  write 

=\\^k,,D^^^Dk,,n'+V+\\^  (6.3) 

<\\^>^A2\\Kj\\2pkh  (6-4) 


□ 

The  right  hand  side  above  can  be  shown  to  depend  on  «:(I4)  and  k{R'i.j).  The 
above  theorem  means  that  the  Lanczos  vectors  computed  by  the  s-step  (BI)CG  algo¬ 
rithm,  VkP-'k  span  Krylov  spaces  of  a  matrix  within  e||  Vj!’||  of  A.  Similar 

observations  have  been  made  for  classical  Hnite  precision  Krylov  methods  [21,  29]. 

In  [21],  Paige  shows  that  for  classical  Lanczos  without  reorthogonalization,  the 
perturbed  matrix  is  very  close  to  A  until  a  Ritz  value  has  stabilized.  It  is  an  open 
question  whether  a  similar  result  (perhaps  with  additional  restrictions  on  Vk)  applies 
to  the  s-step  case. 
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